Abstract

In recent years, the globalization and massification of video-based education have brought more and more
eLearning scenarios into universities. This article addresses interactive video and provides an overview of it.
We analyze the background: the eLearning campuses used by virtual universities around the
world, the MOOC movement over the last year, and the related interactive video platforms in the education field.
We also pay particular attention to the technical aspects of interactive video: the defining concept, the types
of video metadata, media fragments and the types of annotations, as the primary elements that bring interactivity. We
tested several free and commercial interactive web applications and, drawing on these observations, propose a framework
for a web-based interactive video system built on the following main modules: video resource management (production, transcoding
and storage), annotations, Linked Open Data, distribution medium, player interface, data analytics and
recommendation system. Along the way, we present our findings, together with our recommendations for an
annotation interface and player module. The framework is our proposal for Politehnica University Timisoara, either as a standalone
solution or as a complement to the current virtual campus (http://cv.upt.ro), depending on future development plans and
financial aspects.

Keywords: annotation, fragment, interactive, metadata, player, transcoding, video 

1. Introduction 

1.1 Digital Native 

Advances in science, technology and the eLearning field have gradually changed our personal lives and society.
“Today’s students are no longer the people our educational system was designed to teach” (Prensky, 2001).
Students have unlimited and unrestricted access to information and a different approach to work and learning
(Tapscott, 1999). They were born in a digital age, and technologies are an integral part of their lives. They acquire
information faster, using multiple sources. They are surrounded by digital technologies and spend a
lot of their time watching television, surfing the Internet, playing games, using mobile phones, etc. (Yong &
Gates, 2014). The literature classifies students as Generation Y or Digital Natives, birth years 1977-1994, and
Generation Z or Net Generation, birth years 1995-present (Kinash, Wood, & Knight, 2013). They have a
“hypertext mind”, “leap around” (Prensky, 2001), and have a “parallel cognitive structure and not sequential” (Yong &
Gates, 2014). They are characterized by multitasking, openness to sharing content (Oblinger & Oblinger, 2005),
random access, functioning best when networked (Yong & Gates, 2014), constant connectivity, speed in the delivery of
information (Prensky, 2001), and a unique attitude towards education (Corrin, Bennett, & Lockyer, 2010). There are
many such labels, and each differs in the way researchers use it. Nevertheless, in general the terms are used
interchangeably (Jones & Shao, 2011). To meet students’ needs, teachers must know how to capture
students’ attention and interest in and after the classroom. Digital native students have spent thousands of hours
watching television and communicating through emails, cell phones and instant messaging, and less time reading
books (Prensky, 2001). Students are both consumers and creators of electronic media material (Torres & Ross,
2014). Understanding this is “vitally important” (Teo, 2013) to allow teachers to build new materials, to improve
their skills, and to use new technologies (Yong & Gates, 2014). Modern educational concepts and universities must
align with these goals, which is why learning programs and online infrastructure have been improved continuously since their
first appearance. Contemporary eLearning involves virtual educational environments,
(interactive) video lectures and new (video) platforms such as MOOCs or interactive video platforms. These are widely used
and seem to be well regarded (Jones & Shao, 2011).

1.2 Major ELearning Platforms

Educational content management systems (CMS, LMS, LCMS, and VLE) provide an interaction area for
students and tutors, as well as methods of delivering educational content consisting of written materials, audio
recordings of lectures and video sequences. These are joined by current technologies such as social
networks, streaming, podcasting, audio-video conferences, forums and blogs (Clark & Mayer, 2011).
Prestigious universities around the world have chosen the partial or full integration of these technologies into their
educational platforms. The studies by Ermalai (2011) and Onita (2011) reveal the major
platforms used by universities:

• Blackboard in Austria (Salzburg University), in Canada (British Columbia University), in Japan (City 
University of Hong Kong), in Romania (Spiru Haret University from Bucharest, Ioan Cuza University
from Iasi), in Sweden (Stockholm University), in UK (University of Manchester), in USA (University of 
Princeton, University of Drexel, Purdue University, University of Kent, Lesley University); 

• IntraLearn in USA (Boston University);

• Moodle in Australia (Open University), in Canada (Alberta University, Athabasca University), in China 
(Open University), in France (Sophia-Antipolis University, Descartes University), in Finland (Helsinki 
University), in Romania (Politehnica University of Timisoara, Politehnica University of Bucharest, 
Technique University of Cluj-Napoca, Medicine Faculty from Timisoara, Credis University Bucharest, 
Carol I University Bucharest, Vasile Goldis Faculty from Arad, Economics Faculty from Oradea,
Transilvania University of Brasov, Economics Faculty from Bucharest, Stefan Cel Mare University from 
Suceava, Maritime University from Constanta), in Serbia (Belgrade University), in UK (Open University), 
in Korea (OU Korea University), in USA (Cornell University, University of Harvard, University of 
California); 

• Sakai in USA (University of Stanford, University of Cambridge, Johns Hopkins University, Massachusetts
Institute of Technology, University of Yale, Berkeley University);

• uPortal in USA (University of Minnesota, Duke University). 

Video is a medium of information, teaching and communication. It has a measurable presence on the online platforms of the
universities listed above and has become the primary factor in the new dimension of modern eLearning called
MOOCs (Zahn, Krauskopf, Kiener, & Hesse, 2014). MOOC is an acronym for Massive Open Online Course. Its
characteristics include scalability (a huge number of students all over the globe, and anyone can participate);
content delivery based on Web video platforms; and courses with a particular design,
credits and period (Yuan & Powell, 2013). The main actors in the field are listed in Figure 1 (Onita, Mihaescu, &
Vasiu, 2015).


[Figure 1: word cloud of MOOC providers, including OpenLearning, Coursera, MRUniversity, Udemy, iversity, edX, Stanford, OpenHPI, FutureLearn, Open2Study, TedEd, P2PU, MOOEC, NovoEd, MiriadaX, Unimooc, Eliademy, Udacity, Canvas Network, Khan Academy, Codecademy, Veduca and iDESWEB.]

Figure 1. MOOCs (generated with http://worditout.com)


“Measured by student numbers, the top five MOOC providers are Coursera with 10.5 million registered students, 
edX with 3 million, Udacity with 1.5 million, the Spanish-speaking MiriadaX at 1 million, and UK-based 
FutureLearn with 800,000 students. Measured by course distribution, the top MOOC providers in 2014 were
Coursera, edX, Canvas Network, MiriadaX, FutureLearn, Udacity, CourseSites, iversity, Open2Study, and
NovoEd. While 80% of the MOOCs were taught in English in 2014, they were also taught in 12 other languages”
(Education Dive, 2014).

1.3 Romania's Online Education

Online education has come a long way, and this is reflected in the Romanian eLearning field as well.
The section above listed some online education platforms used in Romania; based on the studies by Ermalai (2011) and Onita
(2011), we add: AeL at Carol I University from Bucharest and West University from Timisoara;
IBM Lotus Learning Space at the University of Medicine and Pharmacy of Targu Mures; MEDIAEC at Al. I. Cuza
University from Iasi; Microsoft at Babeș-Bolyai University from Cluj and Grigore Popa University of Medicine
and Pharmacy from Iasi; NESSI at West University from Timisoara; Run CMS at the National Defence Carol I
University from Bucharest; and SNSPA/UVA at the David Ogilvy Faculty of Communication and Public Relations from
Bucharest. The local evolution followed the global movement: from study by postal correspondence to online
university courses based on (L)CMS (e.g., Moodle, AeL, Blackboard, Microsoft, NESSI, MEDIAEC) and the arrival of MOOCs
(Figure 2), with cautious approaches in multiple domains of interest such as: basics of accounting, courses
for women, critical thinking in today’s communication, educational archives, English for beginners, digital
libraries, programming, hairdressing, homeopathy, mathematics, data analytics, IT, personal development,
preschool education, finance and theology (Mihaescu, 2014).


[Figure 2: word cloud of Romanian online course sites: www.ecursuri.ro/cursuri-online, www.cursuri-online.info, online.drjurj.ro, mate.deeetal.ro, iversity.org/en/courses/critical-thinking-in-today-s-communication, www.fluent.ro, www.eurocor.ro, www.infoarena.ro/arhiva-educationala, math-pdr.com, mooc.ro, www.zfelearning.ro/cursuri, www.digibuc.ro/colectii, www.bibliotecapemobil.ro, www.timsoft.ro/index.php?pagina=cursuri, www.bazelecontabilitatii.ro, www.academiaonline.ro, www.catalog-cursuri.ro, www.youtube.com/user/gospodinamd, www.didactic.ro/resurse-educationale, www.biblacad.ro/UPCmeniu.html, www.bibnat.ro/Biblioteca-Digitala-s89-ro.htm, scolispeciale.edu.ro, www.zcplus.ro, romania.haironline.eu/Cursuri.html, www.teologie.eu/calendar/arhiva/suporturi-de-curs.]

Figure 2. Romanian MOOCs approaches (generated with http://worditout.com)


The general online (video) educational context offers encouraging signs. The Romanian Ministry of
Education “announced a national digitization initiative that will digitize all of Romania’s educational content by
the 2017-2018 school years” (Adkins, 2013). The first phase started in 2014. In 2015, “the innovative
teaching methods and techniques” will be introduced in 67% of Romanian institutions, according to the Sursock
(2015) study. Adkins (2013) maintains that (mobile) learning will grow to 50% by 2017 and that the growth rate
of self-paced learning will be 40% in 2015. Romanians, according to a report of the National Audiovisual
Council, are large consumers of video materials (CNA, 2007). The National Strategy Digital Agenda Romania
points out that interactive visual materials and additional resources from the Internet will increase learners’
engagement by 2020. National Education Law no. 1/2011, which introduces the Virtual School Library, describes new
and challenging eLearning platforms with lessons from all curricula, dynamic elements, self-evaluation tests, video
lessons and open educational resources. In terms of IT infrastructure, studies show that Romania ranks among the
top 20 countries by average Internet connection speed (Akamai, 2013), and the penetration rate of
high-speed Internet connections (over 4 Mbps) is over 79%. This follows the European goal to “connect every
school, ideally including connectivity to individual classrooms, to broadband, to upgrade the ICT equipment, and
develop accessible, open national digital learning repositories using structural and investment funds by 2020”
(Voicu, 2015). It can be concluded that Romania is well prepared, in terms of both its IT base and its eLearning
perspectives, for the integration of video materials and new video platforms.

2. Interactive Video 

2.1 Definitions 

Collins, Neville and Bielaczyc (2000) emphasize that video-dominant
systems are the most attractive, convincing and engaging media elements, with a much stronger impact than static
images and strictly textual information. In addition, a Cisco study estimates that video
will account for 73% of all Internet traffic by 2017 (Cisco, 2013). These observations from the literature, the continuous
trend of improving (educational) video content, and the infrastructure of MOOCs and university online
campuses around the world establish the premises for a complete video experience, with a productive interaction between the video itself
and information related to the images, themes and concepts presented in it. This is the path toward
interactive video elements. The literature defines interactive video as:

• One of the most exciting types of media, combining the power of moving images, the story of the video, the 
depth and wealth of the information enriched by interactivity (Chen, 2012); 

• Video or hypervideo: video material enhanced by various methods with interactive elements that
provide a non-linear way to transmit information, similar to World Wide Web hyperlinks (Petan & Vasiu,
2013);

• A convergence of interactive television with the Internet that brings a lot of benefits in areas like eLearning 
and business (Lytras, Lougos, Chozos, & Pouloudi, 2002). 

In practice, and from an instructional point of view, adapting classical course material requires redesigning the
course and restructuring its content, which increases video production costs for such platforms (Jermann,
Bocquet, Raimond, & Dillenbourg, 2014). Video interactivity, not widely implemented so far, comes as a
timely complement to the educational platform. It provides “depth information” and diversity, with the opportunity to reach extra
resources starting from a central video element. Moreover, researchers such as Zahn,
Krauskopf, Kiener and Hesse (2014) or Jensen (2008) note that the classical type of video display generates apathy
in education instead of active learning, which is the main purpose of the digital information transmission process.
Studies in the literature call for an evolution of video beyond a “passive unidirectional TV type experience” in
order to facilitate collaborative processes, direct the attention of students, question students online, and
support a gradual transition from one stage of learning to another (Pea, 2006), resulting in an increased level of interest
and personal satisfaction of students in relation to the content (Marchioria, Blanco, Torrente,
Martinez-Ortiz, & Fernandez-Manjon, 2011).

2.2 Technical Factors 

Technically speaking, in the context of creating and distributing interactive video materials over the Internet, a
primary factor is video metadata. Metadata is structured information that “describes, explains, locates, or
otherwise makes it easier to retrieve, use or manage an information resource” (NISO Press, 2001). It is the
descriptive part of an individual video clip, in addition to the mere playback of frames. It can be generated either
manually by a human creator or automatically by various types of video processing. The information thus
generated describes the current video material, and it can be used to identify additional information associated
with the themes appearing in the video. Without metadata, a video is a black box in terms of the content and concepts it presents:
from the point of view of the web browser, a video clip embedded within a page is
completely opaque. From a technical standpoint, metadata can be stored in the video files, in particular fields
defined by the existing standards, or it can be saved in dedicated management systems, that is, databases. In the first
case, metadata is stored in the header of the video, in the same file; the available fields for metadata are given by
the structure of the video standard that is used. This option has the advantage of keeping the metadata associated
directly with the video it describes, but access to the metadata is slower because it requires processing the video file, which
is usually large. A much faster alternative is to store the metadata in dedicated management systems, which offer
faster response times to queries but have the disadvantage of being separated from the video files they describe.
Video materials that carry relevant descriptive metadata become visible to users who are looking for accurate
information. There are several types of metadata:

• Technical metadata - obtained through automatic analysis of the video. It is a particular type
of administrative metadata describing properties of the digital video: format, compression rate, audio-video
codec, resolution, bitrate, file size, video length, information on the equipment used to capture the material;

• Descriptive metadata - describes a resource so that it can be discovered and identified, and summarizes the content of
the video. The type of information recorded is chosen so as to make a collection interoperable. This metadata
is entered manually by the producer or by ordinary users: video title, description, category, tags,
associated related videos, secondary annotations;

• Administrative metadata - provides access to, stores and helps organize the digital collection. The information
provided does not directly describe the resource itself but helps to manage it,
copyright for example.

These types of metadata are complemented by interactivity information related to annotations,
decisions and links between clips, as well as usage information (e.g., viewing statistics, viewed clips,
chosen decisions, decision annotations).
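The second storage option discussed above, a dedicated management system kept separately from the video files, can be illustrated with a minimal sketch. It is only one possible arrangement and rests on several assumptions: the `VideoMetadata` record, the table layout and the helper names are hypothetical, and an in-memory SQLite database stands in for whatever database a real platform would use.

```python
import json
import sqlite3
from dataclasses import dataclass, field


@dataclass
class VideoMetadata:
    """Hypothetical record grouping the three metadata types described above."""
    video_uri: str
    technical: dict = field(default_factory=dict)       # e.g. codec, resolution, bitrate
    descriptive: dict = field(default_factory=dict)     # e.g. title, description, tags
    administrative: dict = field(default_factory=dict)  # e.g. copyright


def open_store(path=":memory:"):
    """Open the metadata database, kept separately from the video files."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metadata ("
        "video_uri TEXT PRIMARY KEY, technical TEXT, "
        "descriptive TEXT, administrative TEXT)"
    )
    return conn


def save(conn, record):
    """Insert or update the metadata of one video; each group is stored as JSON."""
    conn.execute(
        "INSERT OR REPLACE INTO metadata VALUES (?, ?, ?, ?)",
        (
            record.video_uri,
            json.dumps(record.technical),
            json.dumps(record.descriptive),
            json.dumps(record.administrative),
        ),
    )
    conn.commit()


def find_by_tag(conn, tag):
    """Return the URIs of videos whose descriptive tags contain `tag`."""
    rows = conn.execute(
        "SELECT video_uri, descriptive FROM metadata ORDER BY video_uri"
    )
    return [uri for uri, desc in rows if tag in json.loads(desc).get("tags", [])]


# Example usage with made-up records.
conn = open_store()
save(conn, VideoMetadata(
    "lesson1.mp4",
    technical={"width": 640, "height": 360},
    descriptive={"title": "Intro", "tags": ["eLearning", "video"]},
    administrative={"copyright": "UPT"},
))
save(conn, VideoMetadata("lesson2.mp4", descriptive={"tags": ["math"]}))
print(find_by_tag(conn, "eLearning"))
```

The query answers without opening any video file, which is the speed advantage noted above; the price is that the link between a record and its video exists only through the stored URI.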

In the process of creating rich experiences through interactive video and adapting video to the World Wide Web
paradigm, the concept of linking additional information is unavoidable. A regular website usually
contains hyperlinks to other sections of the same page or to other pages within the same site, and also to
web pages located on other sites and servers. Hyperlinks can also point to other
pages, images, Linked Data resources or multimedia materials. Similarly, an interactive video system can be
considered a video system containing references to information within the same clip, to
other video and multimedia resources managed on the same server, or to external information. In
the first case, where the reference points to another subsection of the same video clip, we can use the
notion of a video fragment, similar to a chapter of a DVD movie.

The World Wide Web Consortium (W3C), a multi-organizational entity led by the founder of the WWW, Tim
Berners-Lee himself, aims to define recommendations and directions for the long-term development of the World Wide
Web. For video, one of the key specifications (W3C Consortium, 2013) defines the notion of “media pieces” (media fragments).
The main purpose of the media fragments specification is to address subsections inside
videos, similar to HTML anchors. Such an anchor, defined by adding a # followed by the anchor name,
refers to a subsection of the current page annotated as such. The term media fragment describes a
portion/segment of a media object (Li, Wald, Omitola, Shadbolt, & Wills, 2012), a fact also highlighted by the
name: fragment + media. An example is a 30-second fragment of a video clip lasting 2 minutes. In terms
of their structure, media fragments have three main components: #, t, xywh; described as:

• # - indicates that the preceding part is the physical location of the file and that the following part addresses an excerpt from
an image or a video; the excerpt has two possible components, the temporal and the spatial dimension;

• t - the temporal dimension (W3C, 2013); it can have two values that represent the beginning and the end
of the fragment, in seconds;

• xywh - the spatial dimension (W3C, 2013); it has four values: the first two, x and y, are the coordinates, and w and h
represent the width and height of the defined fragment.

So, to uniquely identify a portion of a video, the URI that addresses the video material has the following
structure: http://www.name.ro/videoname.mp4#t=t_start,t_end&xywh=5,10,640,480. The process described
above, in which the video content creator attaches to a given spatial region or video segment (media fragment)
additional resources, such as other media materials embedded within the page, hyperlinks to external resources or
information obtained based on the principles of the Semantic Web (Wald, Omitola, Shadbolt, & Wills, 2012), is part
of video annotation. To cover all cases that may arise, annotations can be classified into:

• Conceptual - a generic concept applies to the video clip as a whole;

• Temporal - a particular idea, object or person occurs between second t1 and second t2;

• Spatial - a portion of the image has a definite meaning;

• Temporal/spatial - combines spatial and temporal annotations;

• Subject in motion - the subject of the annotation moves in the video frame.
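The URI structure and the annotation classes above can be tied together in a short sketch. It is a simplified illustration, not a complete implementation of the W3C syntax: it assumes times given as plain seconds and regions given as plain pixel values, whereas the full specification also allows clock-style time formats and percentage units. The function names are hypothetical.

```python
def parse_media_fragment(uri):
    """Split a media fragment URI into (base URI, temporal part, spatial part).

    Simplified sketch: times must be plain seconds, xywh plain pixel values.
    """
    base, _, fragment = uri.partition("#")
    temporal = None  # (start_seconds, end_seconds or None)
    spatial = None   # (x, y, width, height)
    for pair in fragment.split("&"):
        name, sep, value = pair.partition("=")
        if not sep:
            continue  # empty fragment or a token without "="
        if name == "t":
            start, _, end = value.partition(",")
            temporal = (float(start) if start else 0.0,
                        float(end) if end else None)
        elif name == "xywh":
            x, y, w, h = (int(v) for v in value.split(","))
            spatial = (x, y, w, h)
    return base, temporal, spatial


def annotation_kind(temporal, spatial):
    """Map the parsed dimensions onto the annotation classes listed above."""
    if temporal is not None and spatial is not None:
        return "temporal/spatial"
    if temporal is not None:
        return "temporal"
    if spatial is not None:
        return "spatial"
    return "conceptual"  # no fragment: the annotation applies to the whole clip


# Example: a 30-second fragment restricted to a 640x480 region.
uri = "http://www.name.ro/videoname.mp4#t=30,60&xywh=5,10,640,480"
base, temporal, spatial = parse_media_fragment(uri)
print(base, temporal, spatial, annotation_kind(temporal, spatial))
```

A subject-in-motion annotation does not fit into a single URI of this kind; under the same assumptions it could be represented as a sequence of temporal/spatial fragments, one per position of the subject.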

2.3 Major Players 

Interactive video platforms mostly serve the education and entertainment domains. They are released under
free or commercial licenses and result from project collaborations, company efforts or individual initiatives.

We searched the Internet for the main actors in generating online interactive materials; the results are presented in
Table 1 (platforms from Australia and India), Table 2 (platforms from the USA and Canada) and Table 3
(platforms from European countries):

Table 1. Major players in interactive video from Australia and India

Country | Name | Details
Australia | iWeaver | An interactive system that uses text, images, audio and animations to teach a web programming course (Java Programming Language). It is a project of Christian Wolf, a teacher at the Faculty of Education, Language, and Community Services, RMIT University Melbourne, Australia (Wolf, 2002).
India | Pad.ma | Public Access Digital Media Archive is an online archive of text-annotated video material, primarily footage rather than finished films. The Pad.ma project was initiated by a group consisting of CAMP from Mumbai, 0x2620 from Berlin and the Alternative Law Forum from Bangalore.


Table 2. Major players in interactive video from America and Canada

Name | Details
HyperCafe | A project of the Georgia Institute of Technology, School of Literature, Communication and Culture, College of Computing. It represents a virtual cafe, composed primarily of digital video clips of actors involved in fictional conversations in the cafe (Sawhney, Balcom & Smith, 1996).
NeXtream | An MIT Media Lab project at Cambridge. It represents a framework and implementation for the next generation of media consumption and TV (Martin, Santos, Shafran, Holtzman & Montpetit, 2010).
OVA | Open Video Annotation is an interdisciplinary initiative led by HarvardX and supported by the Center for Hellenic Studies at Harvard, Office of Scholarly Communications, and the Berkman Center for Internet and the Center for Mind Informatics, Massachusetts General Hospital.
Remark | A collaborative video annotation, review and approval tool for video teams, clients, and stakeholders. The headquarters is in Austin, TX.
Youtube | Annotations are clickable text overlays on YouTube videos.
VARS | Video Annotation and Reference System was developed by the Monterey Bay Aquarium Research Institute (MBARI) in 2011 for annotating deep-sea video data.
Vatic | Researchers from the University of California, Irvine developed a free, online, interactive video annotation tool for computer vision that crowdsources work to Amazon's Mechanical Turk (Vondrick, Patterson & Ramanan, 2013).
VCode & VData | The result of Ph.D. work in the Department of Computer Science, University of Illinois at Urbana-Champaign. A suite of “open source” applications creates a set of valid interfaces supporting the video annotation workflow (Hagedorn, Hailpern & Karahalios, 2008).
Vertov | A project of the Concordia Digital History Lab, Centre for Oral History and Digital Storytelling, Concordia University, Montreal. It offers a free media annotation plugin for Zotero.
Video AnnEx | A tool for annotating video sequences with MPEG-7 metadata (Naphade, Lin, Smith, Tseng & Basu, 2002), developed at the IBM T. J. Watson Research Center.
Viddler | An interactive, responsive HTML5 video player. The company began in 2005 as InteractiveTube when two Lehigh University students, Donna DeMarco and Rob Sandie, added interactive functionality to a TV program called Blue's Clues.
Video Ant | A University of Minnesota team offers a mobile and desktop video annotation tool.
Wiremax | The product of a company now headquartered in New York, London, and Venice. It provides a cloud-based, end-to-end pipeline for video ingestion, decomposition, analysis and delivery, with the ability to add advanced annotations in real time.
Zaption | A San Francisco-based tech startup offers an online annotation system; details can be found in the next subchapter.


Table 3. Major players in interactive video from Europe

Country | Name | Details
Algeria | CHM | Component-based Hypervideo Model is a practical framework that allows the design of Web-oriented hypervideos (Sadallah & Aubert, 2012).
Austria | ConnectME | Connected Media Experiences was a research project of STI International, Salzburg Research, PS Media, and Yoovis GmbH. A framework and player act as a proof of concept for the semantics-based dynamic enrichment of videos based on Linked Data annotations (Nixon, Bauer & Bara, 2014).
Belgium | Zentrick | An online platform that drives measurable results for any video by introducing interactive elements that activate, engage and convert audiences.
France | Advene | Annotate Digital Video Exchange on the Net is an ongoing project in the LIRIS Laboratory (UMR 5205 CNRS) at University Claude Bernard Lyon. It aims to provide a model and a format for sharing annotations about digital video documents (movies, courses and conferences), as well as tools to edit and visualize the hypervideos generated from both the annotations and the audiovisual documents (Aubert & Prie, 2005).
Germany | Anvil | A professor from the University of Applied Sciences Augsburg, Germany offers an interactive media tool used in many research areas such as human-computer interaction, linguistics, ethnology, anthropology, psychotherapy, embodied agents, computer animation and oceanography (Kipp, 2012).
Israel | Interlude | Founded by Israeli musician and self-proclaimed tech geek Yoni Bloch. It is targeted at music artists and personalities, as well as entertainment and consumer brands, to create interactive videos powered by Interlude technology.
Netherlands | ELAN | Software developed by The Language Archive (Sloetjes & Wittenburg, 2008). The institute behind the project is the Max Planck Institute for Psycholinguistics, The Language Archive, in Nijmegen, the Netherlands.
Serbia | VAT | Video Annotation Tool was developed in the BOEMIE research project (Bootstrapping Ontology Evolution with Multimedia Information Extraction). VAT is a tool for annotating MPEG video files manually and has already been used successfully for the annotation of sports event videos (Paliouras, Spyropoulos, & Tsatsaronis, 2011).
Sweden | Cantemo | It has established a foothold in the market with broadcast household names. It is an extensible, future-proof, intelligent and easy-to-use Media Asset Management portal solution with multiple apps, including an Annotation Tool.
Switzerland | AAV | Annotating Academic Video is a project resulting from the collaboration between the Entwine team, Switch Net Services and Universität Bern (UniBE), Pädagogische Hochschule Zürich (PHZH), Université de Lausanne (UniL), Pädagogische Hochschule Thurgau (PHTG) and Université de Fribourg (UniF). The project goal was to create a standardized, open and flexible tool/framework to enable Swiss university faculty, the other schools involved, staff, and students to annotate video across a mix of platforms including players, video management and learning management systems (Entwine, 2014).


2.4 Zaption Testing

To test and verify some theoretical principles of annotations, we chose this platform because it
focuses on the development and use of interactive video in education, both in Romania and worldwide. Addressed to
all learning cycles, Zaption aims to encourage teachers, educators, trainers and content publishers to
transform video materials into an interactive and engaging experience. To use the Zaption application, it is necessary
to create an account. We chose the Pro Membership Plan, at $89/year. After logging in, we noticed that the
application offers three components: tours, videos, and groups. The video component has features such as video search in
the Zaption databases, administration of one's own materials, and video uploading. The group component allows creating a new
group or enrolling in an existing group. The tour component offers the same functionalities as the video component (e.g.,
search, add, admin), but for a new user it is where the annotation itself starts. The beginning of the tour
coincides with adding the video: a local one, videos from the Zaption databases or from other video sharing platforms such as
Vimeo, PBS, National Geographic, TED, Discovery, NASA, Edutopia, Vsauce, Crashcourses, Scishow, CGP
Grey etc.

The annotations are varied: real-time drawing, open-response questions, numerical-answer
questions, multiple-choice questions, checkbox questions, questions answered
by drawing, discussions in which users can ask questions or post comments, repeating the video, and skipping a certain portion
of the video. The position setting places the annotation to the right of or above the video frame. The behavior setting
allows choosing between two actions: stop or play the video. Duration, as the name suggests, sets the period during
which the annotation is displayed. Once the work is finished, the final step is publishing and sharing
the interactive video material. Once the tour is published, it can be posted to the Zaption Gallery for everyone to
see. Adding a description and tags, and selecting a category and age level, are also possible. From the
technology profile and technical metadata perspective, the Zaption application was developed based on the solutions
listed in Table 4.
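The position, behavior and duration settings described above can be pictured as fields of an annotation record. The following is a minimal sketch under that assumption; the class and function names are hypothetical and do not reflect Zaption's actual data model or API.

```python
from dataclasses import dataclass


@dataclass
class TourAnnotation:
    """Hypothetical annotation record mirroring the settings described above."""
    kind: str                  # e.g. "multiple_choice", "open_response", "drawing"
    start: float               # seconds into the video at which it appears
    duration: float            # seconds during which it stays displayed
    position: str = "right"    # "right" of or "above" the video frame
    pauses_video: bool = True  # behavior: stop (True) or keep playing (False)

    def is_visible(self, t):
        """True while playback time t falls inside the display period."""
        return self.start <= t < self.start + self.duration


def visible_at(annotations, t):
    """Annotations the player should display at playback time t."""
    return [a for a in annotations if a.is_visible(t)]


def should_pause(annotations, t):
    """The player stops if any displayed annotation asks for it."""
    return any(a.pauses_video for a in visible_at(annotations, t))


# Example tour with two made-up annotations.
tour = [
    TourAnnotation("multiple_choice", start=10, duration=5),
    TourAnnotation("discussion", start=12, duration=20,
                   position="above", pauses_video=False),
]
print([a.kind for a in visible_at(tour, 13)])
print(should_pause(tour, 13), should_pause(tour, 20))
```

Keeping the display period on the annotation itself lets the player decide what to show from the current playback time alone, without any per-annotation timers.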


Table 4. Technology profile for Zaption

Name | Solution
Nameserver Providers | Amazon Route 53
SSL Certificate | GoDaddy SSL
Hosting Providers | Amazon
Email Services | Sendgrid, Google Apps for Business, SPF
Analytics and Tracking | Mixpanel, Google Analytics, Optimizely, New Relic
JavaScript Libraries | jQuery, Skrollr
Audio/Video Media | VideoJS, Sublime Video, Youtube
Mobile | Apple Mobile Web Clips Icon, Viewport Meta
Widgets | Google Font API, Smart App Banner
Content Delivery Network | Cloud Front, Zencoder CDN, Ajax Libraries API, Vimeo CDN
Document Information | Cascading Style Sheets, Twitter Cards, HTML5 Specific Tags, Conditional Comments, Google Chrome IE Frame, Javascript, WAI-ARIA, HTML5 Video Audio Tags, Meta Description, HTML5 DocType, X-UA-Compatible, P3P Policy
Encoding | UTF-8
CSS Media Queries | Min Width, Max Width, Device Pixel Ratio
CDN Providers | Amazon Cloud Front


Video 
ID                          1 
Format                      AVC 
Format/Info                 Advanced Video Codec 
Format profile              Baseline@L3.0 
Format settings, CABAC      No 
Format settings, ReFrames   1 frame 
Codec ID                    avc1 
Codec ID/Info               Advanced Video Coding 
Duration                    2mn 33s 
Bit rate                    293 Kbps 
Maximum bit rate            1 317 Kbps 
Width                       640 pixels 
Height                      360 pixels 
Display aspect ratio        16:9 
Frame rate mode             Constant 
Frame rate                  30.000 fps 
Color space                 YUV 
Chroma subsampling          4:2:0 
Bit depth                   8 bits 
Scan type                   Progressive 
Bits/(Pixel*Frame)          0.042 
Stream size                 5.34 MiB (75%) 

Audio 
ID                          2 
Format                      AAC 
Format/Info                 Advanced Audio Codec 
Format profile              LC 
Codec ID                    40 
Duration                    2mn 33s 
Bit rate mode               Variable 
Bit rate                    96.0 Kbps 
Maximum bit rate            103 Kbps 
Channel(s)                  2 channels 
Channel positions           Front: L R 
Sampling rate               44.1 KHz 
Compression mode            Lossy 
Stream size                 1.75 MiB (25%) 

Figure 3. Zaption video and audio technical metadata 

To identify the technology profile listed in Table 4, we chose the BuiltWith application (http://builtwith.com), a 
tool for identifying the technologies used in web applications. It is designed for a small group of users, including 
web developers and researchers. The generated results give an overview of the complexity of the technical parts 
required in the development process. We also analyzed the compression and encoding parameters (audio-video 
technical metadata) of the video content that plays inside Zaption. We downloaded some video lessons to the 
local machine using Video DownloadHelper and extracted the audio-video metadata with the free MediaInfo 
software. MediaInfo has a simple graphical interface and offers several different views of the information that 
allow the user to determine what metadata are present. Figure 3 gives values for the following components: 
Format profile: format/container, file size, duration; Audio parameters: audio codec, (maximum) bit rate, 
channel(s), sampling rate, compression mode, stream size; Video parameters: video codec, profile@level, settings 
(CABAC or CAVLC, GOP, M, N), video frame size, (maximum) bit rate, display aspect ratio, frame rate, color 
space, chroma subsampling, bit depth, scan type, Qf - bits/(pixel*frame), stream size (Onita et al., 2015). 
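
As a quick consistency check on the Figure 3 values, the Qf figure and the stream sizes can be recomputed from the bit rate, frame size, frame rate and duration. The sketch below is illustrative only: the function names are our own, and it assumes that a Kbps in the MediaInfo report means 1000 bits per second and that a MiB means 2^20 bytes.

```python
def bits_per_pixel_frame(video_bitrate_bps, width, height, fps):
    """Qf: average number of bits spent on one pixel of one frame."""
    return video_bitrate_bps / (width * height * fps)


def stream_size_mib(bitrate_bps, duration_s):
    """Approximate stream size in MiB from average bit rate and duration."""
    return bitrate_bps * duration_s / 8 / 2**20


# Values read from Figure 3 (duration 2mn 33s = 153 s).
duration_s = 2 * 60 + 33
qf = bits_per_pixel_frame(293_000, 640, 360, 30.0)
video_mib = stream_size_mib(293_000, duration_s)
audio_mib = stream_size_mib(96_000, duration_s)

print(f"Qf = {qf:.3f}")                    # 0.042, as reported
print(f"video = {video_mib:.2f} MiB")      # 5.34 MiB
print(f"audio = {audio_mib:.2f} MiB")      # 1.75 MiB
print(f"video share = {video_mib / (video_mib + audio_mib):.0%}")  # 75%
```

Under these assumptions the recomputed numbers agree with the reported 0.042 bits/(pixel*frame) and with the 75%/25% split between the video and audio streams.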

3. Interactive Video Project Proposal 

3.1 Block Diagram for an Interactive Video Content 

We collected the ideas from the case studies in the preceding subchapters and followed the trends in MOOC 
video platforms and in the metadata field. The conclusion we drew is that the design of a system for displaying 
interactive video content must take into consideration (Figure 4): video resource management (production, 
transcoding and storage); a module for manual annotation by the administrator or content creator; an extension 
of that module for (semi)automatic (semantic) annotation; a distribution area; a player interface; a data analytics 
block; and a recommendation system. 

Several kinds of devices produce the video content: consumer, prosumer, professional and super-chip cameras; 
DSLR, mirrorless and compact photo-video cameras; and mobile gadgets. Material generated with computer 
software can also serve as a video source: simulations, computer-aided design or video lectures 
(tutorial/demonstration, animated instructional video, voice-over presentation). The transcoding process 
produces audio-video streams encoded in specific formats (with control over the important coding parameters, as 
shown in the under-construction interface in Figure 5), which are stored together with the master video in a 
(cloud) storage system. 


[Figure 5 screenshot: "File Uploader for automatic HTML5 video transcoding" form. Fields shown: Video file 
(TAV-productie.mp4); Video Size (640x360, 16:9); Video Bitrate (700 kbit/s); Frame Rate (25 fps); Deinterlace; 
Enable Audio; Audio sampling rate (44100 Hz); Audio bitrate (128 kbps); Stereo/Mono; HTML5 Video Settings: 
mp4-h264 (audio codec aac), ogv (audio codec libvorbis), webm; Capture stills; "Upload and convert" button.] 

Figure 5. HTML5 video transcoding 

Table 5. Scenarios for audio-video parameters (Onita et al., 2015) 

Video frame size    Audio bitrate*          Codec audio       Video bit rate**           Codec video    Container 
240p, 426 x 240     196 / 128 / 64 kbps     MP3, Vorbis       700 / 400 / 300 Kbps       H.264, VP8     .mp4, .webm 
360p, 640 x 360     196 / 128 / 64 kbps     AAC-LC, Vorbis    1000 / 750 / 400 Kbps      H.264, VP8     .mp4, .webm 
720p, 1280 x 720    512 / 384 / 128 kbps    AAC-LC, Vorbis    4000 / 2500 / 1500 Kbps    H.264, VP8     .mp4, .webm 

*5.1, stereo, mono. 

**maximum, recommended, minimum. 
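
The settings exposed in Figure 5 map naturally onto an FFmpeg command line (FFmpeg is the transcoder named later in Table 6). The following is only a sketch of how such a command could be assembled: the function name, the output naming scheme and the choice of encoders (libx264, libtheora, libvpx, libvorbis, aac) are our assumptions, not the implementation behind the interface.

```python
import shlex

# Assumed encoder pair (video, audio) for each HTML5 target listed in Figure 5.
TARGETS = {
    "mp4": ("libx264", "aac"),
    "ogv": ("libtheora", "libvorbis"),
    "webm": ("libvpx", "libvorbis"),
}


def build_ffmpeg_command(src, target, size="640x360", video_kbps=700, fps=25,
                         audio_hz=44100, audio_kbps=128, channels=2,
                         deinterlace=False):
    """Return the FFmpeg argument list for one HTML5 rendition (not executed)."""
    vcodec, acodec = TARGETS[target]
    cmd = ["ffmpeg", "-i", src]
    if deinterlace:
        cmd += ["-vf", "yadif"]  # yadif is FFmpeg's deinterlacing filter
    cmd += ["-c:v", vcodec, "-s", size, "-b:v", f"{video_kbps}k", "-r", str(fps),
            "-c:a", acodec, "-ar", str(audio_hz), "-b:a", f"{audio_kbps}k",
            "-ac", str(channels)]
    stem = src.rsplit(".", 1)[0]
    height = size.split("x")[1]
    cmd.append(f"{stem}_{height}p.{target}")  # hypothetical naming scheme
    return cmd


# One command per container, using the default values shown in Figure 5.
for target in TARGETS:
    print(shlex.join(build_ffmpeg_command("TAV-productie.mp4", target)))
```

The bit rates passed in can be taken from the scenarios in Table 5, for example 750 Kbps video and 128 kbps stereo audio for the recommended 360p rendition.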


Content annotation can be done manually by the editor/producer (Figure 6), who adds initial connections 
to available entities in LOD (Linked Open Data). According to the W3C Consortium (2014), Linked Data appears in 
the Semantic Web context. “The Semantic Web is a Web of Data - of dates and titles and part numbers and 
chemical properties and any other data one might conceive of. The collection of Semantic Web technologies 
(RDF, OWL, SKOS, SPARQL, etc.) provides an environment where the application can query that data, draw 
inferences using vocabularies, etc. However, to make the Web of Data a reality, it is important to have the vast 
amount of data on the Web available in a standard format, reachable and manageable by Semantic Web tools. 
Furthermore, not only does the Semantic Web need access to data, but relationships among data should be made 
available, too, to create a Web of Data (as opposed to a sheer collection of datasets). This collection of 
interrelated datasets on the Web can also be referred to as Linked Open Data”. 


[Figure 6 screenshot: "Fragment Annotator/Decision producer" form. Fields shown: Add Annotation; Annotation for 
entire video; Time Interval (Start: 0 sec, End: sec); DataURI; Annotation Type; Title; Description; Spatial 
coordinates (xywh, percent: 37,28,34,60); Category; Tags; Explicit?; Clickable?; Pauses Movie?; List of existing 
Decisions; "Save this selection" button.] 

Figure 6. Spatial annotation-module proposal 
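
The time interval and the percent-based xywh coordinates in Figure 6 correspond to the temporal and spatial dimensions of the W3C Media Fragments URI syntax. A minimal sketch of how an annotation could be turned into such a URI is given below; the function name and the example video address are hypothetical, and this is not the code of the proposed module.

```python
def media_fragment_uri(video_uri, start=None, end=None, xywh_percent=None):
    """Append a temporal (t=) and/or spatial (xywh=percent:) fragment to a video URI."""
    parts = []
    if start is not None or end is not None:
        begin = "" if start is None else f"{start:g}"
        finish = "" if end is None else f",{end:g}"
        parts.append(f"t={begin}{finish}")
    if xywh_percent is not None:
        x, y, w, h = xywh_percent
        # A percent region must lie inside the frame.
        if min(x, y) < 0 or min(w, h) <= 0 or x + w > 100 or y + h > 100:
            raise ValueError("percent region must lie inside the video frame")
        parts.append(f"xywh=percent:{x},{y},{w},{h}")
    return video_uri if not parts else f"{video_uri}#{'&'.join(parts)}"


# Spatial values taken from Figure 6; the URI and the end time are made up.
uri = media_fragment_uri("http://example.org/lesson.mp4",
                         start=0, end=12.5, xywh_percent=(37, 28, 34, 60))
print(uri)  # http://example.org/lesson.mp4#t=0,12.5&xywh=percent:37,28,34,60
```

An annotation for the entire video simply omits both dimensions, in which case the function returns the plain video URI.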


Another annotation possibility is semi-automatic: NLP (Natural Language Processing) techniques are applied to 
the audio streams, together with video frame recognition. The system suggests the initial annotations and the 
editor approves them. To extract this kind of data from LOD one can use the Crawling Pattern, in which several 
data sources are crawled to obtain the necessary information. LDIF (Linked Data Integration Framework) and 
Apache Marmotta, part of LMF (Linked Media Framework), are solutions for this process. The data obtained is 
stored in a dedicated triplestore using the OpenRDF Sesame software. The database block is supplemented with a 
relational database that keeps data about the video (meta)information and about user interaction with the video 
itself. Data analytics and a recommendation system complete the picture. The interactive player module displays 
the video together with the annotation information delivered from the database, and it can run under different 
scenarios. We give some particulars on this subject in the next subchapter. 

3.2 Requirements of Annotation Interface and Player Module 

From the organizational point of view there are two sides: on the one hand the creator and publisher of the 
material (the “manufacturer” of annotations), and on the other hand the person viewing the content. This leads to 
two “sides of usability and requirements” for such implementations. In the first case, interactivity should be 
possible for digital video file formats via standalone or web-based interfaces. Regardless of the chosen option, 
the annotation system should allow all the types of annotations described in Table 3, a subset of them, or distinct 
combinations of them. In the Harvard (2014) proposal, the user interface should permit (Figure 7): 
comment or annotation of the entire video piece (1); overlays of images or videos on top of sound visualization 
graph (2); text associated with an annotation and overlays of user-generated graphics (3); user-defined custom 
annotation markers (4); automatic captioning of spoken word, used as baseline annotations-transcript (5); a time 
range (beginning and ending time) for each annotation (6). 



Figure 7. Requirements of a video annotation scene 


The overlays area (3 in Figure 7) has some features (like the Zaption ones): 

• Graphics for shapes such as lines, rectangles, ellipses, polygons and freehand shapes; 

• Stroke and fill properties of graphic objects should be user defined, including color, stroke weight, and 
transparency; 

• Short text tags with custom font weight, size, and color, including an optional highlight color and 
customizable background color for greater contrast; 

• Placement and positioning of graphic objects should be dependent on the graphed waveform and zooming 
level. 

In the second case, viewing and running the annotated video content requires an appropriate video player. There 
are two display modes: window and full screen (Figure 8: MGI - Manually Generated Information, AGI - 
Automatic Generated Information). The window mode belongs to the web approach, where the video content 
occupies a part of the screen and the additional information generated by annotations is displayed in adjacent 
sections that do not overlap the video. The full-screen mode is found on smart TV-IPTV, where the additional 
information is displayed over the video. The player can be designed to adapt to different resolutions (a scalable 
version, as in the Viddler model), in which case the number of displayed blocks of supplementary information 
must be reduced. 

[Figure 8 diagram: player layout with information blocks labeled MGI 1_1, MGI 1_2, MGI 1_3, MGI 1_4, MGI 2_1 
and MGI 2_2.] 

Figure 8. Video player interface, regular window and full screen 


The graphical interface will have a significant role; the literature provides a list of required components (IAB 
Digital, 2014), (Sadallah & Aubert, 2012) for a playing module that can add value in terms of usability (Meixner, 
Holbling, Stegmaier, & Kosch, 2009), (Meixner, Siegel, Schultes, Lehner, & Kosch, 2013): 

• Custom video player with customizable buttons, playback controls and additional buttons (mute, full screen, 
annotations on/off, subtitles on/off, etc.); 

• Graphics and animations overlay: geometric forms and animation elements such as arrows or highlighted zones 
that draw attention to parts of the video frame content; 

• Hotspots, interactive regions with external hyperlinks; 

• Overlay video subtitle; 

• A timeline with the video annotations; 

• Table of contents; 

• A key visual frame map with static image/frame from the video; 

• Downloadable transcript. 

On interaction with the so-called hotspots containing hyperlinks, video playback must be stopped to follow the 
link; the resulting information can be embedded in the hypervideo page or displayed in a new window, 
depending on the selected display mode. The components related to the chronological structure of the video 
(table of contents, map, timeline, transcript, etc.) do not stop playback and allow a jump to the specified location. 
Based on MOOC video pages and our experience in the field, we think some additional components for such an 
interface are worth highlighting: a FAQ zone, an additional information area (e.g., authors, links to other 
materials on the same subject), a multi-clip index, a self-assessment quiz, homework proposals and an interactive 
transcript (the tutor's speech as text, synchronized with the video, in which keywords allow a jump to the 
corresponding part of the material). 
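
The interactive transcript described above can be pictured as a list of timed text cues that supports two lookups: which cue is spoken at the current playback time, and where a keyword occurs so that the player can jump there. The sketch below is a hypothetical illustration; the `Cue` structure, the function names and the sample sentences are our own and do not come from an existing player.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds from the beginning of the video
    end: float
    text: str


def active_cue(cues, t):
    """Return the cue to highlight at playback time t, or None."""
    for cue in cues:
        if cue.start <= t < cue.end:
            return cue
    return None


def jump_targets(cues, keyword):
    """Return the start times of all cues that mention the keyword."""
    needle = keyword.lower()
    return [cue.start for cue in cues if needle in cue.text.lower()]


# Made-up transcript fragment for illustration.
cues = [
    Cue(0.0, 6.5, "Welcome to the lesson on video transcoding."),
    Cue(6.5, 14.0, "A codec compresses the audio and video streams."),
    Cue(14.0, 21.0, "The container stores the streams produced by the codec."),
]

print(active_cue(cues, 7.0).text)   # the cue highlighted at 7 s
print(jump_targets(cues, "codec"))  # seek positions for the keyword
```

Because these lookups only read the cue list, they fit the requirement that components tied to the chronological structure of the video do not stop playback.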

3.3 Managerial Implications 

The performance of a university is reflected in several ways. The institution must aim for quality and novelty. It 
can offer these in the educational act, through modern learning platforms and through its teaching and 
management staff. Our study can have practical implications for the Politehnica University of Timisoara: it 
indicates the need for a new and challenging platform based on (interactive) video. Building it is a process that 
needs time and members who work together as a team to achieve the tasks and goals. A functional alpha version 
needs approximately one year of implementation and specific equipment/software at different stages of 
implementation. In Table 6 we provide short managerial implications for each module proposed in this chapter. 

Table 6. Managerial implications for alpha version of our interactive project proposal 

Module                   Details 
Production               Equipment/Software: DSLR cameras or professional video cameras, editing software, 
                         video graphics station 
                         Number of developers: 1, Time: 3 months, 30 hours, Cost: 1000 Euros 
Transcoding              Equipment/Software: VM1 (virtual machine) with FFMPEG and required libraries 
                         Number of developers: 1, Time: 3 months, 50 hours, Cost: 600 Euros 
Storage                  Equipment/Software: Storage server with 1 TB available space, iSCSI or NFS 
                         Number of developers: 1, Time: 3 months, 50 hours, Cost: 600 Euros 
Manual annotation        Equipment/Software: VM2 configured as a web server, VM3 link data storage 
                         Number of developers: 2, Time: 1 month, 120 hours, Cost: 1200 Euros 
Automatic annotation     Equipment/Software: VM4 for image and sound processing, VM5 for automatic 
                         annotation, VM6 link data storage 
                         Number of developers: 2, Time: 3 months, 150 hours, Cost: 1500 Euros 
Distribution area        Equipment/Software: VM2 configured as a web server 
                         Number of developers: 1, Time: 3 months, 50 hours, Cost: 500 Euros 
Player interface         Equipment/Software: VM2 configured as a web server 
                         Number of developers: 1, Time: 3 months, 100 hours, Cost: 1000 Euros 
Data analytics block     Equipment/Software: VM7 for running DM algorithms (possible solutions: scikit-learn 
                         library for Python or Weka for Java) 
                         Number of developers: 2, Time: 3 months, 100 hours, Cost: 1000 Euros 
Recommendation system    Equipment/Software: VM7 for running DM algorithms (possible solutions: scikit-learn 
                         library for Python or Weka for Java) 
                         Number of developers: 2, Time: 3 months, 150 hours, Cost: 1500 Euros 

The time and cost values are approximate. The developer rate is 10 Euros/hour, i.e., 1600 Euros per full month. 
The virtual machines can be built on a single server machine. 
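
To get an overall figure from Table 6, the per-module hours and costs can simply be added up and compared with the stated rate of 10 Euros per hour. The short sketch below does only this arithmetic; the variable names are ours and the numbers are copied from the table.

```python
RATE_EUR_PER_HOUR = 10  # developer rate stated in the text

# (module, hours, cost in Euros) as listed in Table 6.
modules = [
    ("Production", 30, 1000),
    ("Transcoding", 50, 600),
    ("Storage", 50, 600),
    ("Manual annotation", 120, 1200),
    ("Automatic annotation", 150, 1500),
    ("Distribution area", 50, 500),
    ("Player interface", 100, 1000),
    ("Data analytics block", 100, 1000),
    ("Recommendation system", 150, 1500),
]

total_hours = sum(hours for _, hours, _ in modules)
total_cost = sum(cost for _, _, cost in modules)
labor_cost = total_hours * RATE_EUR_PER_HOUR

# Modules whose listed cost differs from hours x rate.
differing = [name for name, hours, cost in modules
             if cost != hours * RATE_EUR_PER_HOUR]

print(total_hours, total_cost, labor_cost)  # 800 8900 8000
print(differing)  # ['Production', 'Transcoding', 'Storage']
```

The listed costs sum to 8900 Euros for 800 developer hours. For six modules the cost equals hours times the hourly rate; for Production, Transcoding and Storage the listed cost is higher, which is consistent with the values being approximations.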

4. Conclusions 

The wave of popularity that accompanies video-based platforms generates more and more research opportunities. 
Most of this research focuses on the impact of video on higher education, and on new platforms that offer much 
more than a passive learning experience. The video material must provide a wealth of useful information 
according to the user's wishes, whether followed online, on an Internet-connected TV or on a mobile device, 
generating a pattern of active learning. This opens the way to collaborative activities, quizzing and broad 
information search, to efforts to gain the students' attention, and to clear learning objectives. In terms of 
technology, it is the right time for interactive video, for combining elements such as video production, 
transcoding, streaming, semantics, recommendation systems, data analytics, etc. The time has come for 
educational platforms to treat interactivity as a necessity. It is the moment when universities all over the world 
have to invest time, money and effort to improve their online video education platforms or to develop a new, 
challenging interactive video platform from scratch. This is our idea for Politehnica University Timisoara, either 
as a standalone solution or as a complement to the current virtual campus (http://cv.upt.ro), depending on future 
development plans and financial aspects. 

In order to give a proposal closer to current needs, in this paper we highlighted: the existing educational 
infrastructure locally and internationally (the MOOC movement, virtual campuses of universities); the concept 
and progress of video as a central part of e-Learning; the essential elements from a technical point of view; 
related projects in the field; and platform testing (free ones and some commercial ones). 

We collected all the ideas and we propose a web-based system built from the modules shown in Figure 4: video 
production, transcoding, storage, manual and (semi)automatic annotation, LOD, interactive player, databases and 
internal/external resources, data analytics and recommendation system. Blocks such as transcoding, spatial 
annotation and the interactive player were partially built by Sorin Petan for his Ph.D. thesis, and all these pieces 
were put together in a discussion group in January 2015. A concrete, stable version will require at least one year 
of work and, as a minimum, a team of specialists in web development, the Semantic Web, and audio-video 
streaming and transcoding. We want to present this framework to the Board of Politehnica University Timisoara, 
to bring together the needed specialists and to work on a real scenario until the beginning of the next year, in 
order to gather feedback from our students, identify possible problems and improve the user interface and 
overall usability. 